ABSTRACT

According to some researchers, canonical correlation results should
be interpreted in part by consulting redundancy coefficients (Rd).
This paper, however, makes the case that Rd coefficients generally
should not be interpreted. Rd coefficients are not multivariate.
Furthermore, it makes little sense to interpret coefficients not
optimized as part of an analysis. A small heuristic data set,
analyzed with the Statistical Package for the Social Sciences (SPSS),
is used to illustrate Rd. If the researcher's primary interest is to
explore relationships between synthetic variable sets, then canonical
correlation analysis should be used, and the interpretation should
focus on the multivariate variance-accounted-for effect size, the
standardized function coefficients, and the structure coefficients.
(Contains 3 tables and 32 references.) (Author/SLD)

Canonical correlation analysis (CCA) is an analytic method
that can be employed to investigate relationships among two or more
variable sets (Horst, 1961). Typically each variable set itself
consists of at least two variables (otherwise the analysis is
usually called something else, such as a t-test or a regression
analysis). Canonical analysis was first conceptualized by Hotelling
(1935) more than 60 years ago.

Although canonical correlation analysis has a long history, as 
Krus, Reynolds and Krus (1976, p. 725) noted, "Dormant for nearly 
half a century, Hotelling's (1935) canonical variate analysis has 
come of age. The principal reason behind its resurrection was its 
computerization and inclusion in major statistical packages." Of 
course, empirical studies (Emmons, Stallings & Layne, 1990) show 
that, "In the last 20 years, the use of multivariate statistics has 
become commonplace" (Grimm & Yarnold, 1995, p. vii).

There are two reasons why multivariate methods are being used
with increasing frequency. First, multivariate methods control the
inflation of experimentwise Type I error rates (alpha-experimentwise)
that can occur when several univariate tests are conducted with a
single sample's data, even when the testwise error rate
(alpha-testwise) is very small (Thompson, 1994). Second, multivariate
methods, such as
canonical correlation analysis, best honor the nature of the 
reality that most of us want to study, because most of us believe 
we live in a reality where most effects have multiple causes and 
most causes have multiple effects (cf. Tatsuoka, 1973, p. 273). 
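
The inflation of the experimentwise error rate can be illustrated with a short calculation. The sketch below is not taken from the sources cited above; it assumes that the univariate tests are independent and that every null hypothesis is true, in which case the experimentwise rate is 1 - (1 - alpha-testwise)^k for k tests. The function name is hypothetical.

```python
def experimentwise_alpha(testwise_alpha: float, n_tests: int) -> float:
    """Chance of at least one Type I error across n_tests tests.

    Assumes independent tests and true null hypotheses (an
    illustrative simplification, not a claim from the paper).
    """
    return 1.0 - (1.0 - testwise_alpha) ** n_tests


# Show how quickly the rate grows at a testwise alpha of .05.
for k in (1, 5, 10):
    print(k, round(experimentwise_alpha(0.05, k), 4))
```

Under these assumptions, even a handful of univariate tests on one sample pushes the experimentwise rate well above the nominal testwise level.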

Canonical analysis is also important to understand for
conceptual reasons. Canonical correlation analysis is the most
general case of the classical General Linear Model (GLM; Fan, 1996,
1997; Knapp, 1978; Thompson, 1991, in press). Canonical analysis
subsumes all classical analytic methods (e.g., t-tests, ANOVA,
ANCOVA, r, R, MANOVA, DDA), all of which are correlational
analyses, as special cases.

Thus, canonical correlation analysis has been used in a
variety of published research. Wood and Erskine (1976) and
Thompson (1989) provided extensive bibliographies of applications
of canonical correlation analysis. Example applications include
those reported by Chastain and Joe (1987), Dunst and Trivette
(1988), Estabrook (1984), Fowler and Macciocchi (1986), Fuqua,
Seaworth and Newman (1987), Pitts and Thompson (1984), and Zakahi
and Duran (1982). One particularly interesting application
involves studies of multivariate test-retest score reliability or
of multivariate criterion-related score validity (cf. Sexton,
McLean, Boyd, Thompson & McCormick, 1988).

The purpose of the present work is to review one index that
may be used in interpreting CCA results: the redundancy coefficient
(Rd). First, redundancy coefficients will be explained. Then the
advantages and disadvantages of using Rd coefficients within a
canonical analysis will be detailed.

Synthetic vs. Measured Variables: The Origins of Rd

All analyses invoke weights (e.g., standardized canonical 
function coefficients, regression beta weights, DDA standardized 
discriminant function coefficients, factor pattern coefficients) 
that are then applied to the variables that are directly measured 
or observed in a study to obtain scores on the so-called synthetic 
or latent variables (e.g., the regression Y scores, factor scores)
that are actually the focus of all analyses (Thompson, 1998). In
canonical analysis, these synthetic variables are called canonical
function or canonical variate scores (see Thompson, 1991, for
further explanation).

In 1968, Douglas Stewart and William Love attempted to provide
some answers to questions concerning the interpretation of results
from a canonical correlation analysis (CCA). Although they found
CCA to be very helpful in correlating the scores on the synthetic
variables within a canonical analysis, they noted that "relatively
strong canonical correlation(s) may obtain between two linear
functions, even though these linear functions may not extract
[statistically] significant portions of variance from their
respective batteries [i.e., the scores on the measured variables in
the analysis]" (Stewart & Love, 1968, p. 160). Stevens also noted
that the squared canonical correlation only tells the researcher
"the amount of variance that the two canonical variates [i.e.,
synthetic variable scores] share and does not necessarily indicate
considerable variance overlap between the two sets of [measured]
variables" (1996, p. 441).

To overcome this perceived problem and facilitate the
interpretation of CCA results, Stewart and Love (1968)
conceptualized a statistic that they termed the redundancy index
(Rd). In 1975, Miller independently developed a partial test
distribution, using a Monte Carlo study, to test the statistical
significance of Stewart and Love's redundancy index. Stewart and
Love represented the redundancy index as being a measure of the
proportion of "variance of C (the criterion set of variables)
predictable from P (the predictor set of variables), or the
redundancy in C given P" (1968, p. 161).

It is important to note that the canonical correlation 
coefficient (Rc) is a "symmetric" measure of the relationship 
between the synthetic variable scores on a given canonical function 
(Tatsuoka, 1973). That is, on Function I if the correlation (Rc) 
between the canonical function scores for the predictor variable 
set and the canonical function scores for the criterion variable 
set is .5, of course the correlation (Rc) between the canonical 
function scores for the criterion variable set and the canonical 
function scores for the predictor variable set is also exactly .5. 

However, redundancy coefficients are not necessarily 
symmetric, and, in fact, are almost never exactly symmetric. That 
is, on a given canonical function, the Rd coefficient for the 
criterion variable set might be 25%, while on the same function the 
Rd coefficient for the predictor variable set might be 66%. Or, on 
a given function, the Rd coefficient for one variable set might be 
5%, and for the other variable set on the same function, the Rd 
coefficient might be 9%. Stewart and Love (1968) argued that this 
non-symmetry was desirable. 

Computation of Redundancy (Rd) Coefficients

Detailed explanations of the computation of the redundancy
index (Rd) are given in Stewart and Love (1968), Cooley and Lohnes
(1971), Miller (1975), Stevens (1996), and Thompson (1984). The
first step in computing the Rd is to sum the squared structure
coefficients (rs²) on a function. That result is then divided by
the number of variables in the set, to compute an average rs² for
a given variable set on a given function. The resulting figure is
called the "variate adequacy coefficient." The Rd is then obtained
by simply multiplying the variate adequacy coefficient by the
squared canonical correlation (Rc²) for a given function.
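
The arithmetic just described can be sketched in a few lines. The example below is only an illustration: the function names are hypothetical, and the inputs are the rounded Function I values reported for the criterion set in Table 2 (structure coefficients of 0.978 and 0.968, and a squared canonical correlation of 93.80%), so the results agree with the tabled adequacy and Rd only to within rounding.

```python
def variate_adequacy(structure_coefs):
    """Mean of the squared structure coefficients for one set on one function."""
    return sum(r ** 2 for r in structure_coefs) / len(structure_coefs)


def redundancy(structure_coefs, rc_squared):
    """Rd = variate adequacy coefficient times the squared canonical correlation."""
    return variate_adequacy(structure_coefs) * rc_squared


# Rounded Function I values for the criterion set, as reported in Table 2.
crit_rs_function1 = [0.978, 0.968]
rc_squared_function1 = 0.9380

adequacy = variate_adequacy(crit_rs_function1)
rd = redundancy(crit_rs_function1, rc_squared_function1)
print(f"adequacy = {adequacy:.4f}, Rd = {rd:.4f}")
```

Because the structure coefficients enter only through their squares and their average, the computation uses nothing about how the measured variables in the set correlate with one another.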

The Rd is meant as a summary statistic that can provide a
"useful expression for the degrees of relationship between
[observed scores on measured variable] batteries as displayed by
the canonical model" (Cooley & Lohnes, 1971, p. 171). Similarly,
Miller noted that "the bimultivariate redundancy statistic R²y|x
summarizes in a single value the proportion of total test battery
variance that one set of measures (X) explains in another set (Y)"
(1975, p. 233). Muller argued for the use of the redundancy
statistic by characterizing it as "the mean square loading of one
set on a canonical variate of the other set" (1981, p. 141) and
giving a mathematical basis for its derivation. Additional
noteworthy arguments for the redundancy index are made by Gleason
(1976).

Illustrative Example 

Table 1 presents a small heuristic data set analyzed using
SPSS for illustration purposes. This small sample is obviously
unrealistic, but serves as a manageable illustrative tool. This
example includes two sets of measured variables. The first set of
two variables (i.e., "crit1" and "crit2") has been designated the
criterion variables, and the three measured variables (i.e.,
"pred1," "pred2," and "pred3") in the second set have been
designated the predictor variables. This ordering of the variable
sets is arbitrary and of little importance in canonical correlation
analysis, but the designation will be helpful when illustrating the
redundancy results.

INSERT TABLE 1 ABOUT HERE. 

With SPSS for Windows, canonical analysis is accomplished 
using the MANOVA procedure. The relevant command syntax for this 
example would be: 

MANOVA crit1 crit2 WITH pred1 pred2 pred3/
  PRINT=SIGNIF(MULTIV EIGEN DIMENR)/
  DISCRIM(STAN COR ALPHA(.999))/DESIGN .

Although the canonical correlation analysis in SPSS yields some
noteworthy results that are beyond the scope of this paper, Table
2 presents the summary statistics from this heuristic analysis that
are relevant to the present discussion. SPSS labels the adequacy
coefficients for the criterion variables under the subheading "Pct
Var DE" as part of the results labeled "Variance in dependent
variables explained by canonical variables." Likewise, the
adequacy coefficients for the predictor variables are given under
the subheading "Pct Var CO" as part of the results labeled
"Variance in covariates explained by canonical variables." The Rd
is then computed by multiplying the Rc² by the respective adequacy
coefficient.



INSERT TABLE 2 ABOUT HERE.

"Pooled" redundancy coefficients across the canonical
functions can be computed by summing the Rd coefficients for a
given variable set. For this data set the pooled redundancy for the
criterion set given the predictor set is 89.83%, and the pooled
redundancy for the predictor set given the criterion set is 88.37%.
Although these two values are close to equal, as indicated
previously, none of the redundancy coefficient results (i.e., not
even the "pooled" redundancy coefficients) are necessarily
symmetric.

Problems with the Redundancy Coefficient 
Rd Coefficients Are Not Multivariate 

Although the conceptualization of the redundancy coefficient
was initially greeted with great enthusiasm (cf. Cooley & Lohnes,
1971), researchers eventually realized that the redundancy
coefficient is not truly multivariate. Cramer and Nicewander said
that the redundancy coefficient is "not multivariate in the strict
sense because it is unaffected by the intercorrelations of the
variables being predicted" (1979, p. 43). The Rd statistic can
only be considered multivariate in that it involves the use of
several measured variables; this is not the common definition of
"multivariate."

As a means of illustration, five univariate multiple
regression analyses were performed with the illustrative Table 1
data. The first two regressions used all three of the predictor
variables to predict each of the two criterion variables,
separately. The remaining three regressions used both criterion
variables to predict each of the predictor variables, separately.
The results from these five univariate multiple regression analyses
are presented in Table 3.

INSERT TABLE 3 ABOUT HERE.

From Table 3, note that the average multiple R² for the
criterion variables is .898, or 89.8%. This result is the same
value as the pooled redundancy coefficient for the criterion
variable set. Likewise, the average multiple R² for the predictor
variables is .8837, or 88.37%. Again, this result is the same value
as the pooled redundancy coefficient for the predictor variable
set. The redundancy coefficient can now be defined as the "average
squared multiple correlation for predicting variables in one set
from the variables in the other set; consequently, redundancy ...
is synonymous with average [univariate] predictability" (Cramer &
Nicewander, 1979). Rd coefficients are therefore not truly
multivariate statistics.
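
The equivalence between pooled redundancy and average univariate predictability can be checked numerically. The sketch below is an illustration under stated assumptions, not the SPSS MANOVA procedure used in this paper: it assumes the Table 1 values were transcribed correctly, it obtains the canonical functions with a QR decomposition followed by a singular value decomposition in NumPy, and all function names are hypothetical.

```python
import numpy as np

# Table 1 heuristic data: columns are crit1, crit2, pred1, pred2, pred3.
data = np.array([
    [1,  9,  5,  5,  4],
    [1,  9,  5,  5,  6],
    [1,  9,  7,  6,  6],
    [1,  9,  7,  7,  7],
    [3, 10,  9,  9, 10],
    [3, 11,  9, 10,  9],
    [3, 11, 10, 10, 10],
    [3, 12, 11, 11, 11],
], dtype=float)
crit, pred = data[:, :2], data[:, 2:]


def center(a):
    """Subtract each column's mean."""
    return a - a.mean(axis=0)


def correlations(a, b):
    """Pearson correlations between each column of a and each column of b."""
    a, b = center(a), center(b)
    a = a / np.linalg.norm(a, axis=0)
    b = b / np.linalg.norm(b, axis=0)
    return a.T @ b


def canonical(x, y):
    """Return canonical correlations plus variate scores for x and for y."""
    qx, _ = np.linalg.qr(center(x))
    qy, _ = np.linalg.qr(center(y))
    u, rc, vt = np.linalg.svd(qx.T @ qy, full_matrices=False)
    return rc, qx @ u, qy @ vt.T


def pooled_redundancy(own_set, own_variates, rc):
    """Sum over functions of (mean squared structure coefficient) * Rc^2."""
    adequacy = (correlations(own_set, own_variates) ** 2).mean(axis=0)
    return float((adequacy * rc ** 2).sum())


def mean_r_squared(targets, predictors):
    """Average R^2 from regressing each target column on all predictors."""
    design = np.column_stack([np.ones(len(predictors)), predictors])
    coefs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    resid = targets - design @ coefs
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = (center(targets) ** 2).sum(axis=0)
    return float((1 - ss_res / ss_tot).mean())


rc, pred_variates, crit_variates = canonical(pred, crit)
rd_crit = pooled_redundancy(crit, crit_variates, rc)
rd_pred = pooled_redundancy(pred, pred_variates, rc)
r2_crit = mean_r_squared(crit, pred)
r2_pred = mean_r_squared(pred, crit)

print("criterion set: pooled Rd", round(rd_crit, 4), "mean R^2", round(r2_crit, 4))
print("predictor set: pooled Rd", round(rd_pred, 4), "mean R^2", round(r2_pred, 4))
```

Within each variable set the two printed quantities should agree up to floating-point error, which is the point made by Cramer and Nicewander (1979): the pooled Rd can be recovered from separate univariate regressions without conducting a canonical analysis at all.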
Rd Coefficients Are Not Optimized as Part of CCA

In canonical correlation analysis, Rc² is optimized, not Rd!
As Thompson (1991) noted, "it is contradictory to routinely employ
an analysis that uses function coefficients to optimize Rc, and
then to interpret results (Rd) not optimized as part of the
analysis" (p. 89).

If the goal of the analysis is to optimize Rd, then CCA is not
the appropriate analysis. Instead, in such cases redundancy
analysis should be employed (cf. Tyler, 1982; DeSarbo, 1981; van
den Wollenberg, 1977). When redundancy analysis is conducted
instead of CCA, the interpretation of redundancy coefficients makes
more sense.

Conclusions 

These arguments against the use of the redundancy coefficient
may have the researcher wondering whether the Rd is ever of any
value. The answer is "yes," but only in rare cases. It would make
sense to interpret the redundancy coefficients when the "synthetic
variables for the function represent all the variance of every
variable in the set, and the squared Rc also exactly equals 1"
(Thompson, 1991, p. 89). This would be the case in a concurrent
validity study where both variable sets consist of the same or
similar measured variables and "g" (or general) functions are
expected (cf. Sexton, McLean, Boyd, Thompson & McCormick, 1988).

Researchers should use caution when consulting the Rd within
the context of a canonical correlation analysis. Thompson (1984)
noted that "the statistic seems to make more sense in the context
of redundancy analysis or some variant of redundancy analysis" (p.
30) that is designed to optimize Rd. If the researcher's primary
interest is to explore relationships between the synthetic variable
sets, then canonical correlation analysis should be used, and the
interpretation should focus on the multivariate
variance-accounted-for effect size (Rc²), the standardized function
coefficients, and the structure coefficients (rs).

Table 1

Heuristic Data Set

Criterion Variables      Predictor Variables
crit1     crit2          pred1     pred2     pred3
  1         9              5         5         4
  1         9              5         5         6
  1         9              7         6         6
  1         9              7         7         7
  3        10              9         9        10
  3        11              9        10         9
  3        11             10        10        10
  3        12             11        11        11

Table 2

Canonical Summary Statistics

                 Function I Coefficients       Function II Coefficients
Variable/
Statistic        Function    rs      rs²       Function    rs      rs²        h²

crit1              0.564    0.978   95.67%      -2.164   -0.207    4.29%    100.0%
crit2              0.464    0.968   93.70%       2.187    0.252    6.34%    100.0%
Adequacy                            94.69%                         5.31%
Rd                                  88.81%                         1.01%
Rc²                                 93.80%                        19.10%
Rd                                  87.99%                         0.33%
Adequacy                            93.81%                         1.74%
pred1             -0.385    0.958   91.78%       3.045    0.110    1.21%     93.0%
pred2              1.274    0.997   99.40%       0.277    0.049    0.24%     99.6%
pred3              0.104    0.950   90.25%      -3.361   -0.194    3.76%     94.0%

Table 3

Multiple Regressions for Criterion Variables with Predictor Set
and Predictor Variables with Criterion Set

Regression Model/Statistic            R       R²

crit1 WITH pred1, pred2, pred3      .952    .906
crit2 WITH pred1, pred2, pred3      .944    .890
Mean R²                                     .898
Pooled Rd                                   .898

Pooled Rd                                   .884
Mean R²                                     .884
pred1 WITH crit1, crit2             .929    .864
pred2 WITH crit1, crit2             .966    .933
pred3 WITH crit1, crit2             .924    .854